Abstract. - Understanding the structure and evolution of web-based user-object networks is an important task, since these networks play a crucial role in e-commerce. This Letter reports an empirical analysis of two large-scale web sites, audioscrobbler.com and del.icio.us, where users are connected with music groups and bookmarks, respectively. The degree distributions and degree-degree correlations for both users and objects are reported. We propose a new index, named the collaborative clustering coefficient, to quantify the clustering behavior based on collaborative selection. Accordingly, the clustering properties and clustering-degree correlations are investigated. We report some novel phenomena that characterize the selection mechanism of web users and outline the relevance of these phenomena to the information recommendation problem.
Introduction. - The last decade has witnessed tremendous activity devoted to the understanding of complex networks [1-5]. A particular class is that of bipartite networks, whose nodes are divided into two sets X and Y, and where only connections between nodes in different sets are allowed. Many systems are naturally modeled as bipartite networks [6]: the human sexual network [7] consists of men and women, the metabolic network [8] consists of chemical substances and chemical reactions, the collaboration network [9] consists of acts and actors, the Internet telephone network consists of personal computers and phone numbers [10], etc. In addition to empirical analyses of the above-mentioned bipartite networks, great effort has been devoted to how to characterize bipartite networks [11-13], how to project bipartite networks into monopartite networks [14-16], and how to model bipartite networks [17-20].

An important class of bipartite networks is the web-based user-object networks, which play a central role in e-commerce for many online selling sites and online service sites [21]. This class of networks has two specific evolving mechanisms that differ from those of the well-understood act-actor bipartite networks and human sexual networks. Firstly, connections between existing users and objects are generated moment by moment, which does not happen in act-actor networks (e.g., one cannot add authors to a scientific paper after its publication). Secondly, users are active (they select) while objects are passive (they are selected). This is different from human sexual networks, where in principle both men and women are active. In a word, user-object networks are driven by the selections of users, while human sexual networks are driven by matches. Bianconi et al. [22] investigated the effects of the selection mechanisms of users on the network evolution. Lambiotte and Ausloos [23,24] analyzed the web-based bipartite network consisting of listeners and music groups; in particular, they developed a percolation-based method to uncover social communities and music genres. Zhou et al. [15] proposed a method to better measure user similarity in general user-object bipartite networks, which has found applications in personalized recommendation. Huang et al. [25] analyzed user-object networks (called consumer-product networks in Ref. [25]) to better understand purchase behavior in the e-commerce setting (see footnote 1). Grujic et al. [26,27] studied the clustering patterns and degree correlations of user-movie bipartite networks according to the large-scale Internet Movie Database (IMDb), and applied a spectral analysis method to detect communities in the projected weighted networks. They found that the monopartite networks for both users and movies exhibit assortative behavior, while the bipartite network shows a disassortative mixing pattern.

Fig. 1: (Color online) Illustration of a small user-object bipartite network.

Table 1: The basic properties of the two data sets. $N$, $M$ and $E$ denote the numbers of users, objects and edges, respectively. $\langle k \rangle$ and $\langle d \rangle$ are the average user degree and average object degree. $C_u$ and $C_o$ are the collaborative clustering coefficients for users and objects, and for comparison, $s_o$ and $s_u$ are the average similarities over all object pairs and over all user pairs, respectively. The user selection is considered to be highly clustered since $C_u \gg s_o$.

Data set              $N$      $M$       $E$        $\langle k \rangle$   $\langle d \rangle$   $C_u$    $s_o$                   $C_o$    $s_u$
Audioscrobbler.com    35916    617900    5028580    [...]                 [...]                 [...]    [...]                   [...]    [...]
Del.icio.us           10000    232658    1233995    123.40                5.30                  0.0338   $4.64 \times 10^{-4}$   0.0055   $8.10 \times 10^{-4}$

(For Audioscrobbler.com, one further entry, 0.0267, survives in the source, but its column is not recoverable.)

This Letter reports an empirical analysis of two well-known web sites, audioscrobbler.com and del.icio.us, where users are connected with music groups and bookmarks, respectively. Our main findings are threefold: (i) All the object-degree distributions are power laws, while the user-degree distributions obey stretched exponential functions. (ii) The networks exhibit disassortative mixing patterns, indicating that fresh users tend to view popular objects and that unpopular objects are usually collected by very active users. (iii) We propose a new index, named the collaborative clustering coefficient, to quantify the clustering behavior based on collaborative connections. The two networks have high collaborative clustering coefficients for both users and objects. For lower-degree objects, a negative correlation between the object collaborative clustering coefficient and the object degree is observed, which disappears when the degree exceeds the average object degree. For audioscrobbler.com, the user collaborative clustering coefficient is strongly negatively correlated with the user degree, decaying in an exponential form for low degrees.

1 Instead of directly analyzing the bipartite networks, Huang et al. [25] concentrated on the monopartite networks obtained from them.

Fig. 2: Distributions of user degrees, which obey the stretched exponential form [31,32]. We therefore plot the cumulative distribution $P(k)$ instead of $p(k)$ and show the linear fittings of $\log(-\log P(k))$ vs. $\log k$ in the insets.

Fig. 3: Distributions of object degrees, which are power laws (they pass the Kolmogorov-Smirnov test with threshold quantile 0.9) with exponents obtained by maximum likelihood estimation [33].

Basic Concepts. - Figure 1 illustrates a small bipartite network that consists of six users and eight objects. The degree of user $i$, denoted by $k_i$, is defined as the number of objects connected to $i$. Analogously, the degree of object $\alpha$, denoted by $d_\alpha$, is the number of users connected to $\alpha$. For example, as shown in Fig. 1, $k_i = d_\alpha = 3$. The density function, $p(k)$, is the probability that a randomly selected user is of degree $k$, while the cumulative function, $P(k)$, denotes the probability that a randomly selected user is of degree no less than $k$. The nearest neighbors' degree for user $i$, denoted by $d_{nn}(i)$, is defined as the average degree over all the objects connected to $i$. For example, as shown in Fig. 1, $d_{nn}(i) = (d_\alpha + d_\beta + d_\gamma)/3 = [...]$. The degree-dependent nearest neighbors' degree, $d_{nn}(k)$, is the average nearest neighbors' degree over all the users of degree $k$, that is, $d_{nn}(k) = \langle d_{nn}(i) \rangle_{k_i = k}$. The corresponding definitions for objects, namely $p(d)$, $P(d)$, $k_{nn}(\alpha)$ and $k_{nn}(d)$, are similar and thus omitted here.
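
The degree definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy network `SELECTIONS` and all function names are hypothetical, and the toy network is not the one drawn in Fig. 1.

```python
from collections import defaultdict

# Hypothetical toy network (not the one drawn in Fig. 1):
# each user is mapped to the set of objects it has selected.
SELECTIONS = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"a"},
    "u4": {"c", "d"},
}


def user_degrees(selections):
    """k_i: number of objects connected to user i."""
    return {user: len(objects) for user, objects in selections.items()}


def object_degrees(selections):
    """d_alpha: number of users connected to object alpha."""
    degrees = defaultdict(int)
    for objects in selections.values():
        for obj in objects:
            degrees[obj] += 1
    return dict(degrees)


def nearest_neighbor_degree(selections):
    """d_nn(i): average degree of the objects selected by user i."""
    d = object_degrees(selections)
    return {
        user: sum(d[obj] for obj in objects) / len(objects)
        for user, objects in selections.items()
        if objects
    }


def degree_dependent_nn_degree(selections):
    """d_nn(k): average of d_nn(i) over all users with degree k."""
    k = user_degrees(selections)
    buckets = defaultdict(list)
    for user, value in nearest_neighbor_degree(selections).items():
        buckets[k[user]].append(value)
    return {degree: sum(values) / len(values) for degree, values in buckets.items()}


if __name__ == "__main__":
    print(user_degrees(SELECTIONS))
    print(object_degrees(SELECTIONS))
    print(degree_dependent_nn_degree(SELECTIONS))
```

The object-side quantities $k_{nn}(\alpha)$ and $k_{nn}(d)$ follow by applying the same functions to the inverted mapping (object to set of users).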

Fig. 4: The degree-dependent nearest neighbors' degree, $d_{nn}(k)$, as a function of user degree, $k$.

The traditional clustering coefficient [29] cannot be used to quantify the clustering pattern of a bipartite network since it always gives a zero value. Lind et al. [11] proposed a variant that counts rectangular relations instead of triadic clustering, which can be applied to general bipartite networks. However, this Letter aims at a special class of bipartite networks, and thus we propose a new index to better characterize the clustering patterns resulting from the collaborative interests of users. A standard measure of object similarity according to collaborative selection is the Jaccard similarity, $s_{\alpha\beta} = |\Gamma_\alpha \cap \Gamma_\beta| / |\Gamma_\alpha \cup \Gamma_\beta|$, where $\Gamma_\alpha$ and $\Gamma_\beta$ are the sets of neighboring nodes of $\alpha$ and $\beta$, respectively. Obviously, $s_{\alpha\beta} = s_{\beta\alpha}$ and $0 \le s_{\alpha\beta} \le 1$ for any $\alpha$ and $\beta$. For example, as shown in Fig. 1, $s_{\alpha\beta} = s_{\beta\gamma} = [...]$ and $s_{\alpha\gamma} = [...]$. The collaborative clustering coefficient of user $i$ is then defined as the average similarity between $i$'s selected objects: $C_u(i) = \frac{1}{k_i(k_i-1)} \sum_{\alpha \neq \beta} s_{\alpha\beta}$, where $\alpha$ and $\beta$ run over all of $i$'s neighboring objects. For example, as shown in Fig. 1, the collaborative clustering coefficient of user $i$ is $C_u(i) = [...]$. The user collaborative clustering coefficient of the whole network is defined as $C_u = \frac{1}{N'} \sum_i C_u(i)$, where $i$ runs over all users with degrees larger than 1 and $N'$ denotes the number of these users. The degree-dependent collaborative clustering coefficient, $C_u(k)$, is defined as the average collaborative clustering coefficient over all the $k$-degree users. The corresponding definitions for objects are as follows: (i) $C_o(\alpha) = \frac{1}{d_\alpha(d_\alpha-1)} \sum_{i \neq j} s_{ij}$, where $i$ and $j$ run over the users connected to $\alpha$ and $s_{ij} = |\Gamma_i \cap \Gamma_j| / |\Gamma_i \cup \Gamma_j|$ is the Jaccard similarity between users $i$ and $j$; (ii) $C_o = \frac{1}{M'} \sum_\alpha C_o(\alpha)$, where $M'$ denotes the number of objects with degrees larger than 1; (iii) $C_o(d)$ is the average collaborative clustering coefficient over all the $d$-degree objects.
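
As a minimal sketch of these definitions, the following Python code computes the Jaccard similarity and the collaborative clustering coefficients on a small example. The toy network `SELECTIONS` and the function names are hypothetical and chosen only for illustration. Averaging over unordered pairs gives the same value as the normalized sum over ordered pairs $\alpha \neq \beta$, because the similarity is symmetric.

```python
from itertools import combinations

# Hypothetical toy network: user -> set of selected objects.
SELECTIONS = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"a"},
    "u4": {"c", "d"},
}


def jaccard(x, y):
    """Jaccard similarity |x & y| / |x | y| of two neighbor sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def invert(adjacency):
    """Turn a node -> neighbors mapping into neighbor -> nodes."""
    inverted = {}
    for node, neighbors in adjacency.items():
        for neighbor in neighbors:
            inverted.setdefault(neighbor, set()).add(node)
    return inverted


def collaborative_clustering(adjacency):
    """Average pairwise similarity among each node's neighbors.

    Nodes with fewer than two neighbors are skipped, matching the
    restriction to degrees larger than 1 in the definition.
    """
    gamma = invert(adjacency)  # neighbor sets of the opposite node type
    coefficients = {}
    for node, neighbors in adjacency.items():
        pairs = list(combinations(sorted(neighbors), 2))
        if pairs:
            total = sum(jaccard(gamma[a], gamma[b]) for a, b in pairs)
            coefficients[node] = total / len(pairs)
    return coefficients


def network_average(coefficients):
    """Network-level coefficient: mean over the retained nodes."""
    return sum(coefficients.values()) / len(coefficients)


c_user = collaborative_clustering(SELECTIONS)             # C_u(i)
c_object = collaborative_clustering(invert(SELECTIONS))   # C_o(alpha)

if __name__ == "__main__":
    print(c_user, network_average(c_user))
    print(c_object, network_average(c_object))
```

Because the definitions for users and objects mirror each other, the object coefficients are obtained here simply by calling the same function on the inverted mapping.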

Data. - This Letter analyzes two data sets. One was downloaded from audioscrobbler.com (see footnote 2) in January 2005 by Lambiotte and Ausloos [23,24]; it consists of a listing of users, together with the list of music groups the users own in their libraries. Detailed information about this data set can be found in Refs. [23,24]. The other is a random sample of $10^4$ users together with their collected bookmarks (URLs) from del.icio.us (see footnote 3) in May 2008 [28]. Table 1 summarizes the basic statistics of these two data sets.

2 Audioscrobbler.com is a well-known collaborative filtering web site that allows users to create personal web pages as their music libraries and to discover new music groups from other users' libraries.

3 Del.icio.us is one of the most popular social bookmarking web sites, which allows users not only to store and organize personal bookmarks, but also to look into other users' collections and find what they might be interested in.

Fig. 5: The degree-dependent nearest neighbors' degree, $k_{nn}(d)$, as a function of object degree, $d$.

Empirical Results. - Figure 2 reports the degree distributions for users, which follow neither the power-law form nor the exponential form. In fact, they lie between the exponential and power-law forms, and can be well fitted by the so-called stretched exponential distributions [31,32], $p(k) \sim k^{\mu-1} \exp[-(k/k_0)^{\mu}]$, where $k_0$ is a constant and $0 < \mu < 1$ is the characteristic exponent. The borderline $\mu = 1$ corresponds to the usual exponential distribution. For $\mu$ smaller than one, the distribution presents a clear curvature in a log-log plot. The exponent $\mu$ can be determined by considering the cumulative distribution $P(k) \sim \exp[-(k/k_0)^{\mu}]$, which can be rewritten as $\log(-\log P(k)) \sim \mu \log k$. Therefore, using $\log k$ as the x-axis and $\log(-\log P(k))$ as the y-axis, if the corresponding curve can be well fitted by a straight line, then the slope equals $\mu$. Accordingly, as shown in Fig. 2, the exponents $\mu$ for audioscrobbler.com and del.icio.us are 0.76 and 0.66, respectively. These results refine the previous statistics [23], where an exponential function was directly used to fit the user degree distribution of audioscrobbler.com. As shown in Fig. 3, all the object-degree distributions are power laws, $p(d) \sim d^{-\phi}$. The exponents $\phi$, obtained by maximum likelihood estimation [33], are shown in the corresponding figures.
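
The straight-line procedure for estimating $\mu$ can be sketched as follows. This is an illustrative assumption about how such a fit might be coded, not the fitting routine used for Fig. 2: the function names are hypothetical, ordinary least squares is assumed for the line fit, and the check uses synthetic data generated from an exact stretched exponential with $\mu = 0.76$ and $k_0 = 100$ (the value of $k_0$ is arbitrary).

```python
import math


def empirical_cumulative(degrees):
    """P(k): fraction of users whose degree is no less than k."""
    n = len(degrees)
    return {k: sum(1 for d in degrees if d >= k) / n for k in sorted(set(degrees))}


def fit_stretched_exponent(cumulative):
    """Least-squares line through the points (log k, log(-log P(k))).

    Since log(-log P) = mu * log k - mu * log k0, the slope estimates mu
    and the intercept gives k0 = exp(-intercept / mu).
    """
    points = [
        (math.log(k), math.log(-math.log(p)))
        for k, p in cumulative.items()
        if 0.0 < p < 1.0  # P = 1 or P = 0 has no finite double logarithm
    ]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(-intercept / slope)


# Synthetic check: an exact stretched exponential should return its own parameters.
MU_TRUE, K0_TRUE = 0.76, 100.0
synthetic = {k: math.exp(-((k / K0_TRUE) ** MU_TRUE)) for k in range(1, 501)}
mu, k0 = fit_stretched_exponent(synthetic)

if __name__ == "__main__":
    print(mu, k0)
```

With real data, `empirical_cumulative` would supply the input; the point at the smallest degree has $P = 1$ and is dropped by the filter, since its double logarithm is undefined.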

As shown in Fig. 4 and Fig. 5, for both users and objects, the degree is negatively correlated with the average nearest neighbors' degree, exhibiting a disassortative mixing pattern. This result is in accordance with the user-movie bipartite network [26,27], indicating that fresh users tend to view popular objects and that unpopular objects are usually collected by very active users. The correlation between $d_{nn}$ and $k$ is stronger than that between $k_{nn}$ and $d$, which may be caused by the fact that users are active while objects are passive.

Fig. 6: (Color online) The clustering-degree correlations for users. Blue dashed lines denote the collaborative clustering coefficients of the whole networks, $C_u$. The inset displays the early decaying behavior of $C_u(k)$ for audioscrobbler.com, which can be well fitted by an exponential form as $C_u(k) \sim e^{-0.0083k}$.

Fig. 7: (Color online) The clustering-degree correlations for objects. Blue dashed lines denote the collaborative clustering coefficients of the whole networks, $C_o$. The insets display the early decaying behavior of $C_o(d)$, with the red dashed lines denoting the average object degrees.

Table 1 reports the user collaborative clustering coefficients and object collaborative clustering coefficients for the whole networks. For comparison, we calculate the average user similarity over all user pairs, $s_u = \frac{1}{N(N-1)} \sum_{i \neq j} s_{ij}$, and the average object similarity over all object pairs, $s_o = \frac{1}{M(M-1)} \sum_{\alpha \neq \beta} s_{\alpha\beta}$. The connections for both users and objects are considered to be highly clustered since $C_u \gg s_o$ and $C_o \gg s_u$. The clustering-degree correlations for users are reported in Fig. 6. For audioscrobbler.com, a remarkable negative correlation for small-degree users is observed. Actually, $C_u(k)$ decays in an exponential form for small $k$. This result agrees with our daily experience that a heavy listener generally has broader interests in music (see footnote 4). In contrast, for del.icio.us a weakly positive correlation is observed for small-degree users. One reason for the difference between audioscrobbler.com and del.icio.us is that the collections in audioscrobbler.com only reflect particular tastes in music, while the collections of URLs cover countless topics, of which music is just a very small one. In audioscrobbler.com, the collection of a heavy listener (i.e., a large-degree user) usually consists of several music genres, each of which contains a considerable number of music groups, while most of the music groups collected by a small-degree user belong to one genre. However, in del.icio.us, even for a very-small-degree user, his/her few collected URLs can cover highly diverse topics. Therefore, for del.icio.us, one cannot infer that a small-degree user has limited interests. In addition, collections of music groups are mainly determined by personalized interests, while we have checked that in del.icio.us many bookmarks are less personalized, that is, they do not reflect the personal interests of users well. For example, online tools like translators and search engines, and information service sites like train schedules and air ticket centers, are frequently collected. However, we do not yet fully understand the origins of these nontrivial correlations; a future exploration making use of content-based or topic-based analysis of the URLs may provide a clearer picture.

4 At the statistical level, the collaborative clustering coefficient reflects the diversity of a user's tastes: a higher coefficient corresponds to narrower tastes.
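
The background similarities $s_u$ and $s_o$ used for comparison with the clustering coefficients can be sketched as a brute-force average over all pairs. The toy network and function names below are hypothetical. Enumerating every pair is only practical for small examples; with the 617900 objects of audioscrobbler.com there are roughly $1.9 \times 10^{11}$ object pairs, so a real computation would need something more economical than this loop.

```python
from itertools import combinations

# Hypothetical toy network: user -> set of selected objects.
SELECTIONS = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"a"},
    "u4": {"c", "d"},
}


def jaccard(x, y):
    """Jaccard similarity of two neighbor sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def invert(adjacency):
    """Turn a node -> neighbors mapping into neighbor -> nodes."""
    inverted = {}
    for node, neighbors in adjacency.items():
        for neighbor in neighbors:
            inverted.setdefault(neighbor, set()).add(node)
    return inverted


def average_pair_similarity(adjacency):
    """Mean Jaccard similarity over all unordered pairs of keys."""
    pairs = list(combinations(sorted(adjacency), 2))
    return sum(jaccard(adjacency[a], adjacency[b]) for a, b in pairs) / len(pairs)


s_u = average_pair_similarity(SELECTIONS)          # users compared by their object sets
s_o = average_pair_similarity(invert(SELECTIONS))  # objects compared by their user sets

if __name__ == "__main__":
    print(s_u, s_o)
```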



Figure 7 reports the clustering-degree correlations for objects. For the lower-degree objects, a negative correlation between the object collaborative clustering coefficient and the object degree is observed, which disappears at about the average object degree. This result suggests that unpopular objects (i.e., small-degree objects) may be more important than indicated by their degrees, since collections of unpopular objects can be considered a good indicator of common interests: it is not very meaningful if two users both select a popular object, while if a very unpopular object is selected by two users, there must be some common tastes shared by these two users. In fact, the empirical result clearly shows that users who have commonly collected some unpopular objects have much higher similarity to each other than the average. The information contained in those small-degree objects, which usually has little effect in previous algorithms, may be utilized for better community detection and information recommendation.

Conclusion and Discussion. - Today, the explosion of information confronts us with an information overload: we face too many alternatives to be able to find what we really need. Collaborative filtering web sites provide a promising way to help us automatically find relevant objects by analyzing our past activities. In principle, all our past activities can be stored in user-object networks (possibly in a weighted manner), which play a central role in those online services. This Letter reports an empirical analysis of two user-object networks based on data downloaded from audioscrobbler.com and del.icio.us. We found that all the object-degree distributions are power laws while the user-degree distributions obey stretched exponential functions, which refines the previous results [23]. For both users and objects, the connections display disassortative mixing patterns, in accordance with the observations in user-movie networks [26,27]. We proposed a new index, named the collaborative clustering coefficient, to quantify the clustering behavior based on collaborative selection. The connections for both users and objects are considered to be highly clustered since the collaborative clustering coefficients are much larger than the corresponding background similarities.

A problem closely related to the analysis of web-based user-object bipartite networks is how to recommend objects to users in a personalized manner [34,35]. The empirical results reported in this Letter provide some insights into the design of recommendation algorithms. For example, as shown in Fig. 4, the average degree of collected objects is negatively correlated with the user's degree, and fresh users tend to select very popular objects; that is, they have not yet established their personalities and their collections are mostly popularity-based. This phenomenon gives an empirical explanation of the so-called cold-start problem [36], namely that personalized recommendations to very-small-degree users are often inaccurate. In addition, if we compare the significance of the user collaborative clustering coefficient, $C_u/s_o$, with the significance of the object collaborative clustering coefficient, $C_o/s_u$, we find that for both audioscrobbler.com and del.icio.us, the former (268.07 and 72.84) is much larger than the latter (4.11 and 6.79). Therefore, the fact that some users have commonly selected an object does not imply that they are much more similar to each other than two random users, whereas the objects selected by a user are statistically much more similar to each other than two random objects. Collaborative filtering techniques generally fall into two categories [34,35]: one is user-based, which recommends to the target user the objects collected by users sharing similar tastes; the other is object-based, which recommends objects similar to the ones the target user preferred in the past. The comparison between $C_u/s_o$ and $C_o/s_u$ indicates that object-based collaborative filtering will perform better, and such a comparison can be considered helpful evidence when choosing between user-based and object-based algorithms [37]. Furthermore, the clustering-degree correlations reported in Fig. 7 suggest that small-degree objects actually play a more significant role than indicated by their degrees. In fact, we have already demonstrated that emphasizing the impact of small-degree objects can remarkably enhance the accuracy of recommendation algorithms [38,39]. We think that further in-depth analysis of the information contained in small-degree objects can find applications in the design of more efficient and accurate recommendation algorithms.
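
The significance ratios quoted above for del.icio.us can be checked directly from the Del.icio.us row of Table 1. The short Python sketch below does only this arithmetic; the variable and function names are illustrative.

```python
# Del.icio.us values as listed in Table 1.
C_U, S_O = 0.0338, 4.64e-4  # user clustering coefficient, average object similarity
C_O, S_U = 0.0055, 8.10e-4  # object clustering coefficient, average user similarity


def significance(clustering, background):
    """Ratio of a collaborative clustering coefficient to its background similarity."""
    return clustering / background


user_significance = significance(C_U, S_O)    # C_u / s_o
object_significance = significance(C_O, S_U)  # C_o / s_u

if __name__ == "__main__":
    # prints "72.84 6.79"
    print(f"{user_significance:.2f} {object_significance:.2f}")
```

The larger ratio on the user side is the quantity behind the statement that the objects selected by one user are far more similar to each other than two random objects.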